### A Pluto.jl notebook ###
# v0.20.4

using Markdown
using InteractiveUtils

# This Pluto notebook uses @bind for interactivity. When running this notebook outside of Pluto, the following 'mock version' of @bind gives bound variables a default value (instead of an error).
macro bind(def, element)
    #! format: off
    quote
        local iv = try Base.loaded_modules[Base.PkgId(Base.UUID("6e696c72-6542-2067-7265-42206c756150"), "AbstractPlutoDingetjes")].Bonds.initial_value catch; b -> missing; end
        local el = $(esc(element))
        global $(esc(def)) = Core.applicable(Base.get, el) ? Base.get(el) : iv(el)
        el
    end
    #! format: on
end

# ╔═╡ d1d6c1fa-d6f2-4ab5-9387-57d141b81fe3
# ╠═╡ show_logs = false
import Pkg; Pkg.activate(".")

# ╔═╡ c97d6e21-0c4c-4d79-8193-abc8e3001d0c
begin
	using Bootstrap
	using CairoMakie
	using CommonMark
	using Distributions
	using Format
	using HypothesisTests
	using Latexify
	using LaTeXStrings
	using Optim
	using PValue
	using PlutoUI
	using Random
end

# ╔═╡ 813ab26f-b80e-4fc2-8d95-357ccac051ed
begin
	using DSP
	
	fg = Figure()
	
	ax1 = Axis(fg[1, 1])
	
	bsfunc(x) = ifelse.(-0.5 .<= x .<= 0.5, 1., 0.)
	cvnx(l) = 3. .* (LinearIndices(l) / length(l)) .- 1.5
	
	bsr = range(start=-1.5,stop=1.5,step=0.001)
	
	a = bsfunc(bsr)
	b = conv(a,a)
	bn = b/maximum(b)
	c = conv(b,a)
	cn = c/maximum(c)
	d = conv(c,a)
	dn = d/maximum(d)
	
	
	
	lines!(bsr,a,label=L"f",linewidth=2)
	lines!(cvnx(bn),bn,label=L"f \ast f",linewidth=2)
	lines!(cvnx(cn),cn,label=L"f \ast f \ast f",linewidth=2)
	lines!(cvnx(dn),dn,label=L"f \ast f \ast f \ast f",linewidth=2)
	
	axislegend()
	
	fg
	
end

# ╔═╡ b95f212e-a904-425b-ab72-3a2a62a36051
md"""
**What is this?**


*This notebook is part of a collection of notebooks on various topics discussed during the Time Domain Astrophysics course delivered by Stefano Covino at the [Università dell'Insubria](https://www.uninsubria.eu/) in Como (Italy). Please direct questions and suggestions to [stefano.covino@inaf.it](mailto:stefano.covino@inaf.it).*
"""

# ╔═╡ cb5178e4-ca00-4d0a-8ae2-abc8aba79edc
md"""
**This is a `Julia` notebook**
"""

# ╔═╡ 90fa08da-6fb6-4a7a-ab84-82db43589cf6
Pkg.instantiate()

# ╔═╡ 782247c0-5683-45be-83dc-c640fc5af519
# ╠═╡ show_logs = false
md"""
$(LocalResource("./Pics/TimeDomainBanner.jpg"))
"""

# ╔═╡ 7609a27d-dcaf-400a-b07a-b2ce79d46546
md"""
# Statistics Reminder
***
"""

# ╔═╡ 2bcfe923-1fa1-4faa-96a4-4493ab10f607
md"""
## Probability

***

- We adopt here a Bayesian view: probability is just a degree of certainty about a statement.

- We describe probability following the so-called Bayesian system of probability. 

- An experiment is any action that can have a set of possible results where the actually occurring result cannot be predicted with certainty prior to the action.

The set $Ω$ of all outcomes of an experiment is known as the outcome space or sample space.

A well-balanced coin toss gives $Ω=\{H,T\}$, and the inherent symmetry of the experiment leads to $P(H)=P(T)=0.5$.

"""

# ╔═╡ c3183c6f-74bb-4ad6-9e9c-3c4f3505a472
md"""
### Axioms of probability
***

- A probability space consists of the triplet $\{Ω, F, P\}$: a sample space, a class of events, and a function that assigns a probability to each event in $F$, subject to:

    - Axiom 1: $0 \le P(A) \le 1$, for all events A
    - Axiom 2: $P(\Omega) = 1$
    - Axiom 3: For mutually exclusive (pairwise disjoint) events $A_1, A_2, ...,$



"""

# ╔═╡ aec2e7b5-d07b-4246-82bb-f1b6867261fe
cm"""

<div align="center">

``P(A_1 \cup A_2 \cup ...) = P(A_1) + P(A_2) + ... ``

</div>
"""

# ╔═╡ ee09bcaa-e455-44b6-9d77-26e73e3942e3
cm"""
- or, if for any ``i \neq j, A_i \cap A_j = \emptyset``, then
"""


# ╔═╡ 8d7236e1-452c-4623-a6dd-756c5eb53f5d
cm"""
```math
 P(\bigcup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} P(A_i) 
```

"""

# ╔═╡ b9462e27-1bc9-4d92-ac01-efb604fe8eff
md"""
- From these, it follows that for generic events `C` and `D`, not necessarily disjoint:

```math
P(C \cup D) = P(C) + P(D) - P(C \cap D) 
```

- A formal proof of the previous statement can be obtained by observing that: 

```math
P(C \cup D) = P(C \cap D^c) +  P(C \cap D) + P(D \cap C^c) = 
```

```math
= P(C \cap D^c) +  P(C \cap D) + P(D \cap C^c) +  P(C \cap D) -  P(C \cap D) =
``` 

```math
= P(C \cap (D \cup D^c)) + P(D \cap (C \cup C^c)) - P(C \cap D) = 
```

```math
= P(C \cap \Omega) + P(\Omega \cap D) - P(C \cap D) =  P(C) + P(D) - P(C \cap D)
```


"""

# ╔═╡ 5f6217f3-7c8b-4feb-8150-af67660fe6a4
cm"""
### Conditional probability
***

- This is a far from trivial concept in probability:

```math
 P(A | B) \equiv \frac{P(A \cap B)}{P(B)} 
```

- The formula above can be read as "probability of `A` given `B` (i.e. knowing that `B` occurred)".

> E.g., when rolling a die, the event ``A=\{1,2,3\}`` has ``P(A)=1/2``. The probability of an even outcome (``B=\{2,4,6\}``) is again ``1/2``, but ``P(A \cap B)`` is ``1/6``. Then, ``P(A | B) = 1/3``.
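
As a quick check, the example above can be reproduced by brute-force enumeration. This is only a minimal sketch: the helper `prob` and the variable names are illustrative and not part of the notebook.

```julia
# Enumerate the six equally likely outcomes of a fair die
outcomes = 1:6
A = Set([1, 2, 3])      # event A
B = Set([2, 4, 6])      # event B: even outcome

# Probability of an event as (favourable outcomes) / (all outcomes), kept as an exact rational
prob(E) = count(in(E), outcomes) // length(outcomes)

p_A_and_B = prob(intersect(A, B))    # P(A ∩ B)
p_A_given_B = p_A_and_B / prob(B)    # P(A | B) = P(A ∩ B) / P(B)

println(p_A_given_B)
```

Working with rationals (`//`) avoids any rounding, so the result can be compared exactly with the value quoted in the example.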

"""

# ╔═╡ 85b2cb68-9d2b-484a-b979-0b81de743178
cm"""

### Bayes’ Theorem
***

- Introduced by Thomas Bayes, but recognized earlier by James Bernoulli and Abraham de Moivre, and later fully explicated by Pierre Simon de Laplace.

```math
 P(A | B) = \frac{P(B | A)~P(A)}{P(B)} 
```

   - `P(A|B)`: the **posterior**, or the probability of the model parameters given the data: this is the result we want to compute.
   - `P(B|A)`: the **likelihood**, i.e. the same function used in the frequentist approach, although with a different meaning.
   - `P(A)`: the **model prior**, which encodes what we knew about the model prior to the application of the data.
   - `P(B)`: the data probability or **evidence**, in most cases just a normalization factor, but fundamental for model inference.
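
As a sanity check of the formula (a purely illustrative example, unrelated to model fitting), consider again a fair die with ``A=\{1,2,3\}`` and ``B=\{2,4,6\}``. Here ``P(B|A) = 1/3``, because only one of the three outcomes in ``A`` is even, and therefore:

```math
P(A | B) = \frac{P(B | A)~P(A)}{P(B)} = \frac{\frac{1}{3} \cdot \frac{1}{2}}{\frac{1}{2}} = \frac{1}{3}
```

in agreement with the value obtained directly from the definition of conditional probability.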

It is intriguing that most of the debate between "frequentists" and "Bayesians" is not due to the mathematics of the theorem, but to its philosophical meaning, i.e. the basis of Bayesian inference.
"""

# ╔═╡ f3af6e72-7146-4eea-951f-37ee2f08cbff
md"""
> Before discussing Bayesian inference, we now recall some useful "frequentist" concepts and algorithms.

### Basic definitions and some useful “frequentist” tools
***

- *Random variable*: variable describing the possible output of an experiment. 
    - It can be discrete or continuous. 
    
- *Parent distribution*: distribution of values for a RV if the experiment is repeated a very large (ideally infinite) number of times.


> These definitions are somewhat reconsidered in a Bayesian scenario, but for now let's set these (actually interesting) issues aside.

- *Mean* of parent distribution: $\mu \equiv \lim_{N\to\infty}(\frac{1}{N} \sum_i x_i)$ 
- *Median* of parent distribution: the value $\mu_{1/2}$ such that 50% of the values lie below it and 50% above it. 
    - Computing the median requires sorting the data.
- *Mode* (or most probable value): $P(\mu_{max}) \ge P(x \ne \mu_{max})$.

> For symmetric, unimodal distributions the mean, median and mode coincide.
"""

# ╔═╡ fbdfc6c9-dee0-43f2-8c50-a11ff74eafe6
cm"""
- Standard deviation (``\sigma``) and variance (``\sigma^2``) of the parent distribution:

```math
 \sigma^2 \equiv \lim_{N \to \infty} \left[ \frac{1}{N} \sum_i (x_i - \mu)^2 \right] 
```

- And standard deviation (``s``) and variance (``s^2``) of a sample of ``N`` measurements, where ``\bar{x}`` is the sample mean:

```math
 s^2 = \frac{1}{N-1} \sum_i (x_i - \bar{x})^2 
```
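
A minimal sketch of these quantities in Julia, using only the `Statistics` standard library. The data values are arbitrary and chosen just for illustration; note that `var` and `std` divide by ``N-1`` by default (`corrected=true`).

```julia
using Statistics

x = [16, 11, 15, 16, 11, 12, 18, 5, 3, 1]   # arbitrary illustrative measurements

xbar = mean(x)                       # sample mean
xmed = median(x)                     # median (the data are sorted internally)
s2 = var(x)                          # sample variance, 1/(N-1) normalization
s2_pop = var(x; corrected=false)     # same sum of squares, 1/N normalization
s = std(x)                           # sample standard deviation, sqrt(s2)

println((xbar, xmed, s2, s2_pop, s))
```

The two normalizations differ by a factor ``N/(N-1)``, which becomes negligible for large samples.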

"""

# ╔═╡ 6a4cba4f-bf8a-497e-98ae-8b32503dc983
md"""
## Counting statistics
***

- With modern digital devices, a large fraction of what we call "measurement" consists of counting discrete events (photons, etc.).
"""

# ╔═╡ 3b6a9f06-7bca-4745-8cce-c878462fd943
# ╠═╡ show_logs = false
md"""
$(LocalResource("./Pics/photoncounting.png"))
"""

# ╔═╡ 3879ebfd-2c2c-4339-bca4-4a8f09a7d10d
md"""
- Let's assume we count, on average, $\mu$ events in a given time interval $t$.
- Let's then divide our time interval into a large number $n$ of sub-intervals, so that the probability to detect one event in one sub-interval becomes $p \approx \mu/n$.

- Then, the probability to detect $k$ events in our time interval is given by the [Binomial probability distribution](https://en.wikipedia.org/wiki/Binomial_distribution):

```math
 P(k; n,p) = \binom{n}{k} p^k (1-p)^{n-k} = \frac{n!}{k!(n-k)!}p^k (1-p)^{n-k} 
```

or, substituting $p = \mu/n$,

```math
 P(k; n, \mu) = \frac{n!}{k!(n-k)!}\left(\frac{\mu}{n}\right)^k \left(1-\frac{\mu}{n}\right)^{n-k} 
```

which, in the limit of large $n$ and hence very low probability $p$, becomes the [Poisson distribution](https://en.wikipedia.org/wiki/Poisson_distribution):

```math
 P(k; \mu) = \frac{\mu^k}{k!} e^{-\mu} 
```
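
The convergence can be checked numerically. The sketch below assumes `Distributions.jl` (already loaded by this notebook) and uses arbitrary illustrative values for the mean and for the number of detected events.

```julia
using Distributions

mu = 3.0     # mean number of events (illustrative value)
k = 2        # number of detected events

# Binomial probability with p = mu/n for an increasing number of sub-intervals
for n in (10, 100, 10_000)
    println("n = ", n, ":  P = ", pdf(Binomial(n, mu / n), k))
end

# Poisson limit
p_poisson = pdf(Poisson(mu), k)
println("Poisson:  P = ", p_poisson)
```

The binomial values approach the Poisson one as the number of sub-intervals grows.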
"""

# ╔═╡ 5ada7105-3a68-48a6-a559-a136e76584a8
md"""
### Binomial distribution
***

- Applicable when carrying out a well defined number of trials, $n$, each of which has probability $p$ of success.
    - Example: rolling $n=100$ dice, what is the probability of getting exactly $15$ sixes ($p = 1/6$)? $P(15; 100, 1/6) \approx 10$%.
"""

# ╔═╡ 273d70f2-6db2-42a9-bbc6-76c94d2d7f11
md"""
Number of trials: $( @bind n1 NumberField(1:100, default=50) ) 
"""

# ╔═╡ b18909e0-205e-40ad-ab87-7985d986cb9d
md"""
Probability of success of a single trial: $( @bind p1 NumberField(0:0.01:1, default=0.5) ) 
"""

# ╔═╡ 88012fd0-476a-46eb-9375-202ceccfc953
md"""
Number of successes: $( @bind x1 NumberField(1:n1, default=10) ) 
"""

# ╔═╡ abac4c88-2feb-463d-9bf3-48c56029548c
begin
	binomialDistribution1 = Binomial(n1,p1)
	
	probOfSuccesses1 = pdf(binomialDistribution1,x1)
end;

# ╔═╡ 6f8f4451-e403-4f22-a10b-64684a790fac
Markdown.parse("""
Probability of successes: $(latexify(100*probOfSuccesses1,fmt="%.2f"))%
""")

# ╔═╡ 4ca1d27b-6d94-4786-a429-40008570f48a
md"> Note that this is the probability of obtaining exactly the required number of successes!"

# ╔═╡ 6870814c-7eda-419a-86bc-bc8b4335b683
md"""
### Poisson distribution
***

- Applicable when the mean number of outcomes, $\mu$, is known or can be estimated and probability $p \ll 1$.
    - Example: About 1% of pregnancies are twin pregnancies. In $1000$ pregnancies, what is the probability of having exactly $5$ twin pregnancies? $P(5; 10) \approx 4$%.
"""

# ╔═╡ 1e8ca5b8-a7d3-40f4-8683-d7a286f03ce5
md"""
Number of trials: $( @bind n2 NumberField(100:100:10000, default=1000) ) 
"""

# ╔═╡ c33a250d-c60a-4608-b74b-816a9770964a
md"""
Probability of success of a single trial: $( @bind p2 NumberField(0:0.01:0.1, default=0.01) ) 
"""

# ╔═╡ 6d7daebf-9ec3-462e-bafc-17e3d36201ec
md"""
Number of successes: $( @bind x2 NumberField(1:n2, default=5) ) 
"""

# ╔═╡ 8359d0d5-28cd-4afc-84e9-711b1da84d1f
begin	
	PoissonDistribution2 = Poisson(n2*p2)
	
	probOfSuccesses2 = pdf(PoissonDistribution2,x2)
end;

# ╔═╡ 4a59ebc1-2cdb-44a3-9088-f6bc96e733aa


# ╔═╡ 3e195d76-4ce2-4cc7-9106-fe0e7f728ed8
Markdown.parse("""
Probability of successes: $(latexify(100*probOfSuccesses2,fmt="%.2f"))%
""")

# ╔═╡ 878d2155-a4fe-4b51-9417-879aa259b52f
md"> Note again that this is the probability of obtaining exactly the required number of successes!"

# ╔═╡ 468e8a03-a11d-48ee-9699-1d10004363d2
begin
	fg1 = Figure()
	ax1fg1 = Axis(fg1[1, 1],
	    xlabel = "k",
	    ylabel = "P(k)",
	    title = "Binomial distributions"
	)
	
	binomialDistributionfgx11 = Binomial(2,0.5)
	probOfSuccessesfgx11 = pdf(binomialDistributionfgx11,0:10)
	barplot!(0:10,probOfSuccessesfgx11,label="p=0.5, n=2", gap = 0)
	#
	binomialDistributionfgx12 = Binomial(6,1/6)
	probOfSuccessesfgx12 = pdf(binomialDistributionfgx12,0:10)
	barplot!(0:10,probOfSuccessesfgx12,label="p=1/6, n=6", gap = 0)
	#
	binomialDistributionfgx13 = Binomial(10,0.9)
	probOfSuccessesfgx13 = pdf(binomialDistributionfgx13,0:10)
	barplot!(0:10,probOfSuccessesfgx13,label="p=0.9, n=10", gap = 0)
	#
	binomialDistributionfgx14 = Binomial(10,0.5)
	probOfSuccessesfgx14 = pdf(binomialDistributionfgx14,0:10)
	barplot!(0:10,probOfSuccessesfgx14,label="p=0.5, n=10", gap = 0)
	
	axislegend()
	
	fg1
end

# ╔═╡ da6b95d4-a640-483e-936a-45a8aee4f4cb
md"""
Average of Poissonian distribution: $( @bind pois1 PlutoUI.Slider(1:20, default=1) ) 
"""

# ╔═╡ f68bc201-d4d6-45a3-8ead-bf767276f657
begin
	fg2 = Figure()
	ax2fg2 = Axis(fg2[1, 1],
	    xlabel = "k",
	    ylabel = "P(k)",
	    title = "Poisson distributions"
	)
	
	PoissonDistributionfg2ax21 = Poisson(pois1)
	probOfSuccessesfg2ax21 = pdf(PoissonDistributionfg2ax21,0:20)
	barplot!(0:20,probOfSuccessesfg2ax21,label="μ="*string(pois1), gap = 0)
	#
	PoissonDistributionfg2ax22 = Poisson(5)
	probOfSuccessesfg2ax22 = pdf(PoissonDistributionfg2ax22,0:20)
	barplot!(0:20,probOfSuccessesfg2ax22,label=L"\mu=5", gap = 0)
	#
	PoissonDistributionfg2ax23 = Poisson(10)
	probOfSuccessesfg2ax23 = pdf(PoissonDistributionfg2ax23,0:20)
	barplot!(0:20,probOfSuccessesfg2ax23,label=L"\mu=10", gap = 0)
	
	axislegend()

	#ylims!(0,0.4)
	
	fg2
end

# ╔═╡ b0ecbb26-0822-4c56-8eaa-605f8f6cd59e
begin
	fg3 = Figure()
	ax1fg3 = Axis(fg3[1, 1],
	    xlabel = "k",
	    ylabel = "P(k)",
	    title = "Binomial vs. Poisson distributions"
	)
	
	binomialDistributionfg3ax31 = Binomial(20,0.5)
	probOfSuccessesfg3ax31 = pdf(binomialDistributionfg3ax31,0:20)
	barplot!(0:20,probOfSuccessesfg3ax31,label=L"Binom. ($p=0.5, n=20$)", gap = 0)
	#
	binomialDistributionfg3ax32 = Binomial(50,0.2)
	probOfSuccessesfg3ax32 = pdf(binomialDistributionfg3ax32,0:20)
	barplot!(0:20,probOfSuccessesfg3ax32,label=L"Binom. ($p=0.2, n=50$)", gap = 0)
	#
	PoissonDistributionfg3ax33 = Poisson(10)
	probOfSuccessesfg3ax33 = pdf(PoissonDistributionfg3ax33,0:20)
	barplot!(0:20,probOfSuccessesfg3ax33,label=L"Poisson ($\mu=10$)", gap = 0)
	
	
	axislegend()
	
	fg3
end

# ╔═╡ 1a80b7ea-4515-44c4-9476-ba9b6eac30c5
md"""
## Mean and variance
***

- For any probability mass function of a discrete variable $P(x_i)$:

    - Mean: $\mu = \sum_{i=0}^\infty x_i P(x_i)$,     
    - Variance: $\sigma^2 = \sum_{i=0}^\infty (x_i-\mu)^2 P(x_i)$
"""

# ╔═╡ b3163979-4422-4c47-b0f5-049c091ce2da
md"""

- For a binomial distribution:

```math
\mu = \sum_0^n k P(k; n, p)  = \sum_0^n k \frac{n!}{k!(n-k)!}p^k (1-p)^{n-k} = np
```
"""

# ╔═╡ ea34b5a9-0937-4cbc-9afd-79055556e640
cm"""

- In fact (the ``k=0`` term vanishes, so the sum can start from ``k=1``): 
	- ``\mu = \sum_1^n k \frac{n!}{k!(n-k)!}p^k (1-p)^{n-k} = \sum_1^n \frac{n!}{(k-1)!(n-k)!}p^k (1-p)^{n-k} =``
	- ``= \sum_1^n \frac{n(n-1)!}{(k-1)!(n-k)!}pp^{k-1} (1-p)^{n-k} = np \sum_1^n \frac{(n-1)!}{(k-1)!(n-k)!}p^{k-1} (1-p)^{n-k}``

- Then: ``\mu = np \sum_1^n \frac{(n-1)!}{(k-1)!((n-1)-(k-1))!}p^{k-1} (1-p)^{n-k} = np \sum_1^n \binom{n-1}{k-1} p^{k-1} (1-p)^{n-k}``

- Let's rename ``n-1 \equiv m`` and ``k-1 \equiv j`` so that ``\mu = np \sum_{j=0}^m \binom{m}{j} p^{j} (1-p)^{m-j}``. 
- The sum is just ``1``, i.e. the total probability of all the possible outcomes of a binomial distribution with ``m`` trials. Thus: ``\mu = np``.
"""

# ╔═╡ 2c20004a-4929-401e-b653-23b471d9a487
cm"""

```math
 \sigma^2 = \sum_0^n (k - np)^2 \frac{n!}{k!(n-k)!}p^k (1-p)^{n-k} = np(1-p) 
```
"""

# ╔═╡ 657c620c-b79f-4fb2-b78e-84c328d2e8ed
md"""
- In fact, $\sigma^2 = \mathbb{E}[k^2] - \mathbb{E}[k]^2 = \mathbb{E}[k^2] - [np]^2$ and we just need to compute the first term.
- As above, we can write the first term as $\mathbb{E}[k^2] = \sum_1^n k^2 \frac{n!}{k!(n-k)!}p^k (1-p)^{n-k}$. We now write $k^2 = k(k-1) + k$, so that we have:

$$\sum_1^n k^2 \frac{n!}{k!(n-k)!}p^k (1-p)^{n-k} = \sum_1^n k(k-1) \frac{n!}{k!(n-k)!}p^k (1-p)^{n-k} + \sum_1^n k \frac{n!}{k!(n-k)!}p^k (1-p)^{n-k}$$

- The last term is again `np` and, following the same procedure we applied before, we have: $\sum_1^n k(k-1) \frac{n!}{k!(n-k)!}p^k (1-p)^{n-k} = n(n-1)p^2 \sum_2^n \frac{(n-2)!}{(k-2)!(n-k)!}p^{k-2} (1-p)^{n-k} = n(n-1)p^2$, since the remaining sum is again $1$.
- Finally, we have: $\sigma^2 = n(n-1)p^2 + np - (np)^2 = -np^2 + np = np(1-p)$

> In the limit of large ``n`` and small ``p``, i.e. Poisson distribution, $\sigma^2 = np = \mu$.
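
Both results are easy to verify numerically by summing over the binomial probabilities. A minimal sketch, assuming `Distributions.jl`; the values of `n` and `p` are arbitrary.

```julia
using Distributions

n, p = 20, 0.3            # arbitrary illustrative values
d = Binomial(n, p)

# Mean and variance from their definitions as sums over all outcomes k = 0, ..., n
mu = sum(k * pdf(d, k) for k in 0:n)
sigma2 = sum((k - mu)^2 * pdf(d, k) for k in 0:n)

println(mu, " vs ", n * p)                  # mean compared with n p
println(sigma2, " vs ", n * p * (1 - p))    # variance compared with n p (1 - p)
```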
"""

# ╔═╡ dd1f7837-f591-4b45-951e-8d1eb2664c8c
md"""
## Error on the mean
***

- Mean of `N` measurements $x_1, x_2, ..., x_N$: $\bar{x} = \frac{1}{N} \sum_i x_i$

- Propagation of error on the mean (for independent measurements): $\sigma^2_{\bar{x}} = (\frac{\partial \bar{x}}{\partial x_1})^2\sigma^2_{x_1} + (\frac{\partial \bar{x}}{\partial x_2})^2\sigma^2_{x_2} + ... = (\frac{1}{N})^2 \sigma^2_{x_1} + (\frac{1}{N})^2 \sigma^2_{x_2} + ... = \frac{1}{N^2} \sum_i \sigma^2_{x_i}$

    - If $\sigma_{x_1} = \sigma_{x_2} = ... =\sigma_i$, then  $\sigma^2_{\bar{x}} = \frac{N\sigma^2_i}{N^2} = \frac{\sigma^2_i}{N} \Longrightarrow \sigma_{\bar{x}} = \frac{\sigma_i}{\sqrt{N}}$
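
The $1/\sqrt{N}$ scaling can be illustrated with a small simulation. This is only a sketch, with arbitrary values for `sigma`, `N` and the number of repetitions, and assuming independent Gaussian measurement errors.

```julia
using Random, Statistics

rng = MersenneTwister(1234)            # fixed seed, for reproducibility
sigma, N, trials = 2.0, 100, 20_000    # arbitrary illustrative values

# Repeat the "experiment" many times: each time average N measurements of scatter sigma
means = [mean(sigma .* randn(rng, N)) for _ in 1:trials]

observed = std(means)         # empirical scatter of the sample means
expected = sigma / sqrt(N)    # error on the mean predicted by error propagation

println(observed, " vs ", expected)
```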
"""

# ╔═╡ 47347aea-bcd9-4329-89e0-0c2020df16fc
md"""
- Speaking of estimators, three properties define a good estimator: it should be unbiased, consistent, and efficient. 
    - An estimator is unbiased when its expected value is equal to the population parameter.
    - An estimator is consistent if it converges to the population parameter as the sample size increases (for an unbiased estimator, this happens when its variance decreases to zero with increasing sample size).
    - With the same sample size, the estimator with lower variance is more efficient.
"""   

# ╔═╡ ed6701f4-9656-4f07-84c6-8e634cecea98
# ╠═╡ show_logs = false
md"""
$(LocalResource("Pics/Estimators.png"))
"""

# ╔═╡ 03744ccd-7a2b-41e7-a10f-ee1b9c0013c9
# ╠═╡ show_logs = false
md"""
- With a Probability Density Function (PDF), the probability of the random variable falling within a particular range of values can be calculated.
    - If the probability density at a certain $x$ is denoted as $f(x)$, then integrating $f(x)$ over the range $(x_1,x_2)$ gives the probability of $x$ falling in $(x_1,x_2)$: $P(x_1 < x < x_2) = \int_{x_1}^{x_2} f(x) dx$.
    
$(LocalResource("Pics/PDF.png"))
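
For instance, for a standard normal distribution the probability of falling within one standard deviation of the mean follows from the difference of the cumulative distribution function at the two limits. A minimal sketch, assuming `Distributions.jl`:

```julia
using Distributions

d = Normal(0, 1)
x1, x2 = -1.0, 1.0

# P(x1 < x < x2) is the integral of the PDF between x1 and x2,
# i.e. the difference of the CDF evaluated at the two limits
prob = cdf(d, x2) - cdf(d, x1)

println(prob)    # about 0.683
```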
"""

# ╔═╡ 8dc68c2e-d1f0-475a-9c7f-d06210d09943
md"""
## The Normal distribution
****

- For large $\mu$ the Poisson distribution can be approximated by a Gaussian (with $\mu = \sigma^2$).

"""

# ╔═╡ 823f9f4c-0c19-40b6-bab0-4b5832d15a6d
cm"""
```math 
P_G(x; \sigma, \mu) = \frac{1}{\sigma \sqrt{2\pi}} e^{\frac{-(x-\mu)^2}{2\sigma^2}} 
```
"""

# ╔═╡ ea9d2ebc-67b3-4945-b8ba-7598e5073b17
begin
	fg4 = Figure(size=(1280,960))
	
	
	ax1fg4 = Axis(fg4[1, 1],
	    xlabel = "k",
	    ylabel = "P(k)",
	    title = L"\mu = 5"
	)
	
	#
	PoissonDistribution = Poisson(5)
	probOfSuccesses = pdf(PoissonDistribution,0:15)
	barplot!(0:15,probOfSuccesses,label=L"Poisson", gap = 0)
	#
	NormalDistribution = Normal(5,sqrt(5))
	probOfSuccesses = pdf(NormalDistribution,0:15)
	barplot!(0:15,probOfSuccesses,label=L"Gaussian", gap = 0)
	
	axislegend()
	
	ax1fg4 = Axis(fg4[2, 1],
	    xlabel = "k",
	    ylabel = L"P_{Gauss}/P_{Poisson}",
	    title = "Gaussian / Poisson"
	)
	
	#
	PoissonDistribution = Poisson(5)
	PprobOfSuccesses = pdf(PoissonDistribution,0:15)
	#
	NormalDistribution = Normal(5,sqrt(5))
	NprobOfSuccesses = pdf(NormalDistribution,0:15)
	
	scatter!(0:15,NprobOfSuccesses ./ PprobOfSuccesses, markersize=20)
	
	ax1fg4 = Axis(fg4[1, 2],
	    xlabel = "k",
	    ylabel = "P(k)",
	    title = L"\mu = 100"
	)
	
	#
	PoissonDistribution = Poisson(100)
	probOfSuccesses = pdf(PoissonDistribution,0:150)
	barplot!(0:150,probOfSuccesses,label=L"Poisson", gap = 0)
	#
	NormalDistribution = Normal(100,sqrt(100))
	probOfSuccesses = pdf(NormalDistribution,0:150)
	barplot!(0:150,probOfSuccesses,label=L"Gaussian", gap = 0)
	
	axislegend()
	
	ax1fg4 = Axis(fg4[2, 2],
	    xlabel = "k",
	    ylabel = L"P_{Gauss}/P_{Poisson}",
	    title = "Gaussian / Poisson",
	)
	
	#
	PoissonDistribution = Poisson(100)
	PprobOfSuccesses = pdf(PoissonDistribution,0:150)
	#
	NormalDistribution = Normal(100,sqrt(100))
	NprobOfSuccesses = pdf(NormalDistribution,0:150)
	
	scatter!(0:150,NprobOfSuccesses ./ PprobOfSuccesses, markersize=20)
	
	ylims!(ax1fg4,0,5)
	xlims!(ax1fg4,50,150)
	
	fg4
end

# ╔═╡ 4bdf78e5-9c93-4727-b662-483ca9d2012a
cm"""
### Inverse distribution
***

- An interesting problem, having to deal with probability distributions, is how to compute the *inverse distribution*, i.e. the distribution of the reciprocal of a random variable.

- In general, given the probability distribution of a random variable `X` with strictly positive support, it is possible to find the distribution of the reciprocal, `Y = 1 / X`. 
    - If the distribution of `X` is continuous with density function `f(x)` and cumulative distribution function `F(x)`, then the cumulative distribution function, `G(y)`, of the reciprocal is found by:
    
```math
G(y) = Pr(Y \le y) = Pr(X \ge \frac{1}{y}) = 1-Pr(X < \frac{1}{y}) = 1 - F(\frac{1}{y})
```

- Therefore, the density function of `Y` is found as the derivative of the cumulative distribution function:

```math
g(y) = \frac{1}{y^2}f(\frac{1}{y}) 
```
"""

# ╔═╡ 1ac335ee-257a-40b3-8357-33466b1b1cbc
md"""
#### Exercise: the inverse of the uniform distribution
***

- If $X$ is uniformly distributed in a given interval $(a,b)$, with $a>0$, $f(x) = \frac{1}{b-a}$. 

- The distribution of the reciprocal $Y=1/X$ takes values in the range $(b^{-1},a^{-1})$, and turns out to be:

```math
g(y) = y^{-2} \frac{1}{b-a}
```
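
This result can be checked by simulation: the corresponding CDF is $G(y) = 1 - F(1/y) = \frac{b - 1/y}{b-a}$, and it should match the fraction of simulated reciprocals below any given $y$. A minimal sketch, with arbitrary values of `a`, `b` and of the test point `y0`:

```julia
using Random, Statistics

rng = MersenneTwister(7)     # fixed seed, for reproducibility
a, b = 1.0, 4.0              # arbitrary interval with a > 0

# Draw X uniformly in (a, b) and take the reciprocal
y = 1 ./ (a .+ (b - a) .* rand(rng, 100_000))

# CDF of Y = 1/X on the interval (1/b, 1/a)
G(t) = (b - 1 / t) / (b - a)

y0 = 0.5
empirical = mean(y .<= y0)   # fraction of simulated values below y0

println(empirical, " vs ", G(y0))
```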
"""

# ╔═╡ 73b91ae5-e2ec-45c8-be23-7aed9a9e8b15
md"""
### The inverse CDF method
***

- We have introduced before the uniform random variable, i.e., a continuous random variable which takes values in the interval from $a$ to $b$.

- The probability density of a uniform random variable $X$ is: $f(x) = \frac{1}{b-a}$. If $a=0$ and $b=1$ we have the so-called standard uniform and $f(x) = 1$.

- A cumulative distribution function (CDF) is the probability that a real-valued random variable $X$ with a given probability distribution is less than or equal to a quantity $x$. It is often denoted by $F(x) = P(X \le x)$.

- Some of the CDF properties are:

    1. The CDF is a non-decreasing function.
    2. $\lim_{x\to+\infty} F(x) = 1$
    3. $\lim_{x\to-\infty} F(x) = 0$

- The CDF of a uniform random variable $X$ in the interval $[a,b]$ is $F(x) = \frac{x-a}{b-a}$ for $a \le x \le b$, since $F(x) = P(X \le x) = \int_a^x \frac{1}{b-a} dt = \frac{x-a}{b-a}$.
- The CDF of a standard uniform random variable is just $F(x) = x$ for $0 \le x \le 1$.

\

- The *CDF method* allows one to generate draws from any non-uniform random variable with a known, invertible CDF, starting from the standard uniform case:

    1. Obtain or generate a draw (realization) $u$ from a standard uniform distribution.
    2. The draw $x$ from CDF $F(x)$ is given by $x = F^{-1}(u)$.
    

#### Exercise: A uniform distribution in the $(a,b)$ interval
***


- This is definitely a contrived example. Yet, the CDF of the distribution we want to sample is, as we know: $F(x) = \frac{x-a}{b-a}$. 
- Let's solve for $x$ in $F(x) = u$. We get $F^{-1}(u) = a + (b-a)u$. If we sample $u \sim \mathcal{U}(0,1)$ we get samples for $X$.


#### Exercise: Samples from the exponential distribution
***


- We want to generate an exponential random variable with $x>0$ and rate $\lambda>0$. The probability density of the exponential random variable is: $f(x) = \lambda e^{-\lambda x}$.
- Integrating $f$ from $0$ to $x$ we have: $F(x) = \int_0^x \lambda e^{-\lambda t} dt = 1 - e^{-\lambda x}$.
- Solving for $x = F^{-1}(u)$ gives $x = -\frac{1}{\lambda} \ln (1-u)$.


#### Exercise: Samples from the Pareto distribution
***

- Now, we deal with the Pareto distribution. Given the shape parameter $k$ and the scale parameter $\lambda$, the Pareto distribution has PDF: $f(x) = \frac{k\lambda^k}{x^{(k+1)}}$.

- The CDF of the Pareto distribution, integrating from $\lambda$ to $x$, is: $F(x) = P(X \le x) = \int_\lambda^x \frac{k\lambda^k}{t^{(k+1)}} dt$. 

- Solving the integral we have: $F(x) = \frac{k\lambda^k t^{-k}}{-k}\Big|_{t=\lambda}^{t=x} = \frac{k\lambda^k x^{-k}}{-k} - \frac{k\lambda^k \lambda^{-k}}{-k} = -\lambda^k x^{-k} + 1 = 1-(\frac{\lambda}{x})^k$.

- Now, let's solve for $F(x) = u = 1-(\frac{\lambda}{x})^k$ and we get: $x = F^{-1}(u) = \frac{\lambda}{(1-u)^{1/k}}$.
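
The last two exercises translate directly into code. The sketch below draws standard uniform deviates and maps them through the inverse CDFs derived above for the exponential and Pareto distributions. Parameter values and variable names are illustrative only; the expected means quoted in the comments are the standard results $1/\lambda$ and $k\lambda/(k-1)$ (the latter valid for $k>1$).

```julia
using Random, Statistics

rng = MersenneTwister(2024)       # fixed seed, for reproducibility
u = rand(rng, 200_000)            # draws from the standard uniform distribution

# Exponential distribution: x = -ln(1 - u) / lambda
lambda = 2.0
x_exp = -log.(1 .- u) ./ lambda

# Pareto distribution with shape k and scale (called lambda in the text): x = scale / (1 - u)^(1/k)
k, scale = 3.0, 1.5
x_par = scale ./ (1 .- u) .^ (1 / k)

println(mean(x_exp), " vs ", 1 / lambda)               # exponential mean
println(mean(x_par), " vs ", k * scale / (k - 1))      # Pareto mean
```

The same uniform draws are reused for both distributions only for brevity; independent samples would require independent draws.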
"""

# ╔═╡ 5c946d0d-81f8-43f3-a0ab-196ea9423021
md"""
## Central limit theorem
***

> One of the most important results in statistics

- No matter what distribution (with finite mean and variance) our population follows, as we increase sample size, sampling distribution of the mean converges to a Normal distribution.

- One can also say that the convolution of a large number of (positive) functions $f_i(x)$ with variances $\sigma_i^2$ converges to a Gaussian with $\sigma^2 = \sum \sigma_i^2$.
    - The functions do not need to be Gaussian.

> Just pay attention since the CLT is more subtle than it seems! For interested readers a verbose but sufficiently clear proof is discussed [here](./open?path=Lectures/Lecture - Statistics Reminder/Lecture-CLTProof.jl).

### Convolution
***

- Mathematically, the convolution of two functions $f(x)$ and $g(x)$ is defined as:

```math
(f \ast g)(x) = \int_{-\infty}^{\infty} f(\xi) g(x-\xi) d \xi
```

- As we will see further in the course, the convolution is conveniently carried out in Fourier space:

```math
\mathcal{F}\{f \ast g\} = \mathcal{F}\{f\} \cdot \mathcal{F}\{g\}
```
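
A discrete analogue may help to fix ideas. The sketch below implements a direct (not Fourier-based) convolution of two discrete probability distributions and checks that the variance of the result is the sum of the input variances. The functions `convolve` and `pmf_variance` are illustrative helpers, not part of any package.

```julia
# Direct discrete convolution: out[n] collects all products f[i] * g[j] with i + j - 1 == n
function convolve(f, g)
    out = zeros(length(f) + length(g) - 1)
    for i in eachindex(f), j in eachindex(g)
        out[i + j - 1] += f[i] * g[j]
    end
    return out
end

# Variance of a discrete distribution defined on the support 0, 1, 2, ...
function pmf_variance(pmf)
    ks = 0:(length(pmf) - 1)
    m = sum(ks .* pmf)
    return sum((ks .- m) .^ 2 .* pmf)
end

box = fill(1 / 6, 6)          # uniform ("box") distribution on 0..5
tri = convolve(box, box)      # triangular distribution on 0..10

println(pmf_variance(box))    # 35/12, about 2.92
println(pmf_variance(tri))    # twice as large, about 5.83
```

Convolving the result again with `box` adds one more copy of the variance each time, while the shape gets progressively closer to a Gaussian.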
"""

# ╔═╡ 6cdb0484-7ffc-41e3-8332-b5486ed5bd18
begin
	unimeanList = []
	normeanList = []
	poimeanList = []
	
	
	num_trials = 10000
	num_observations = 1000
	
	
	unidistr = Uniform(1,7)
	nordistr = Normal(0,1)
	poidistr = Poisson(1)
	
	for i in 1:num_trials
	    # sample from uniform distribution
	    uninumList = rand(unidistr,num_observations)
	    # sample from normal distribution
	    nornumList = rand(nordistr,num_observations)
	    # sample from poisson distribution
	    poinumList = rand(poidistr,num_observations)
	    #
	    push!(unimeanList,mean(uninumList))
	    push!(normeanList,mean(nornumList))
	    push!(poimeanList,mean(poinumList))
	end
	    
	    
	unifit = fit(Normal, Float64.(unimeanList))    
	norfit = fit(Normal, Float64.(normeanList))
	poifit = fit(Normal, Float64.(poimeanList))
	    
	#
	
	    
	fg5 = Figure(size=(1000,400))
	
	ax1fg5 = Axis(fg5[1, 1],
	    title="Uniform distribution"
	    )
	     
	hist!(unimeanList,normalization=:pdf)
	rng = range(start=minimum(unimeanList),stop=maximum(unimeanList),step=0.01)
	lines!(rng,pdf(unifit,rng),color=:red,linewidth=5)
	        
	ax1fg5 = Axis(fg5[1, 2],
	    title="Normal distribution"
	    )
	    
	hist!(normeanList,normalization=:pdf)
	rng = range(start=minimum(normeanList),stop=maximum(normeanList),step=0.001)
	lines!(rng,pdf(norfit,rng),color=:red,linewidth=5)    
	    
	ax1fg5 = Axis(fg5[1, 3],
	    title="Poisson distribution"
	    )
	    
	hist!(poimeanList,normalization=:pdf)
	rng = range(start=minimum(poimeanList),stop=maximum(poimeanList),step=0.001)
	lines!(rng,pdf(poifit,rng),color=:red,linewidth=5)   
	    
	fg5
	    
end

# ╔═╡ f8c8b403-0684-4263-8aba-3e2f9c99f37a
md"""
## Maximum Likelihood Estimation
***

- For a given set of observations $x_i \{i=1...N\}$, and assuming a parent distribution $P(x;\xi_1, \xi_2, ...)$, let's find the set of parameters $\xi_1, \xi_2, ...$ so that the probability to observe the values $x_i$ is maximized.

- Let's assume a Gaussian parent distribution:

```math
P_G(x; \sigma, \mu) = \frac{1}{\sigma \sqrt{2\pi}} e^{\frac{-(x-\mu)^2}{2\sigma^2}} 
```

- The total likelihood to observe the whole dataset turns out to be (with the, often hidden, assumption of statistical independence of the data):

```math
\mathcal{L}(\mu, \sigma; x_i) = \prod_i P_G(x_i; \sigma, \mu) = \left( \frac{1}{\sigma \sqrt{2\pi}}\right)^N e^{(-\frac{1}{2} \sum_i \frac{(x_i-\mu)^2}{\sigma^2})}
```

- It is usually simpler working with the logarithm of the likelihood:

```math
\log \mathcal{L}(\mu, \sigma; x_i) = -\frac{N}{2} \log2\pi - \frac{N}{2} \log{\sigma^2} - \frac{1}{2\sigma^2} \sum_i (x_i-\mu)^2
```

- that is maximized where $\partial \log \mathcal{L} / \partial{\mu} = 0$ and $\partial \log \mathcal{L} / \partial{\sigma^2} = 0$, yielding the maximum likelihood estimators (MLE).

- For this problem, the solution can easily be found analytically:
    - $\partial \log \mathcal{L} / \partial{\mu} \propto \sum_i (x_i-\mu) = 0$ when $\mu = \frac{1}{N}\sum_i x_i$. Not surprisingly, this is exactly the definition of the sample mean!
    
```math
\partial \log \mathcal{L} / \partial{\sigma^2} = -\frac{N}{2\sigma^2} - \frac{1}{2} \sum_i (x_i-\mu)^2 \frac{d}{d\sigma^2} (1/\sigma^2) = -\frac{N}{2\sigma^2} + \frac{1}{2} \sum_i (x_i-\mu)^2 \frac{1}{\sigma^4} =
```

```math
= \frac{1}{2\sigma^2} \left[ \frac{1}{\sigma^2} \sum_i (x_i-\mu)^2 - N \right]
```

- For $\sigma^2 \ne 0$, this derivative vanishes when $\sigma^2 =  \frac{1}{N} \sum_i (x_i-\mu)^2$, i.e. the unadjusted (biased) sample variance.
    
- In general, however, there is no analytical solution and we have to refer to numerical methods to maximize the $\log \mathcal{L}$ or minimize the negative of the previous relation.

- The uncertainty on the parameters can be better estimated by Monte Carlo methods.
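
The analytical solution for the Gaussian case can be verified numerically. In the sketch below (illustrative data and names, only the `Statistics` standard library), the log-likelihood evaluated at the analytical estimators is compared with its value at slightly perturbed parameters:

```julia
using Statistics

x = [16, 11, 15, 16, 11, 12, 18, 5, 3, 1]   # arbitrary illustrative measurements
N = length(x)

# Gaussian log-likelihood as a function of the mean and of the variance
loglike(mu, s2) = -N / 2 * log(2 * pi) - N / 2 * log(s2) - sum((x .- mu) .^ 2) / (2 * s2)

# Analytical maximum likelihood estimators
mu_hat = mean(x)
s2_hat = sum((x .- mu_hat) .^ 2) / N        # unadjusted sample variance

println(loglike(mu_hat, s2_hat))            # value at the maximum
println(loglike(mu_hat + 0.5, s2_hat))      # lower
println(loglike(mu_hat, 1.2 * s2_hat))      # lower
```

When no analytical solution is available, the same `loglike` function (with a changed sign) is what one would pass to a numerical minimizer.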



### Significance level and p-value
***

- The p-value is the probability, assuming that the null hypothesis holds true, of obtaining a result at least as extreme as the observed one. 

    - A low p-value means that the observation is unlikely to occur under the condition that the null hypothesis holds true. 

    - When the p-value is lower than the significance level, we reject the null hypothesis. 
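
As an illustration, assume a test statistic that, under the null hypothesis, follows a standard normal distribution, and a hypothetical observed value `z_obs = 2`. The one-sided p-value is then the probability of obtaining a value at least as large as the observed one. A minimal sketch, assuming `Distributions.jl`; the numbers are invented for the example.

```julia
using Distributions

z_obs = 2.0                            # hypothetical observed test statistic
alpha = 0.05                           # chosen significance level

p_value = ccdf(Normal(0, 1), z_obs)    # P(z >= z_obs) under the null hypothesis
reject = p_value < alpha               # decision at the chosen significance level

println(p_value)    # about 0.023
println(reject)
```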
"""

# ╔═╡ adec1076-9584-4620-b76b-95cc61d05ed2
# ╠═╡ show_logs = false
md"""
$(LocalResource("Pics/stdci.jpg"))

$(LocalResource("Pics/stdci2.png"))

"""

# ╔═╡ 680e6fe2-6bb6-4cae-8289-f634b9a8618e
md"""
### The bootstrap method
***

- The idea is rather simple. Let's assume we have a dataset $\{x_i\}, i=1...N$, and define any function $T(x_i)$ of the measurements.
- Let $T$ be the value of the function computed for our dataset, and $T_j$ the value of the function computed for each of a number of subsamples built by drawing $n$ data from the original dataset with *replacement*, i.e. a single value can be picked up multiple times.
- Then the variance on $T$, computed for the original dataset, if we have drawn $m$ subsamples, is:

```math
\sigma^2_T = \frac{1}{m} \sum_{j=1}^m (T_j-T)^2
```

- This is the simplest variant of the bootstrap (there are many...), and assumes again statistical independence of the data.
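
The recipe can be written in a few lines without any dedicated package. This is a sketch of the simplest variant described above, with $n = N$, an arbitrary dataset, and the mean as the function $T$; it is not meant to reproduce the implementation of `Bootstrap.jl`.

```julia
using Random, Statistics

rng = MersenneTwister(99)                      # fixed seed, for reproducibility
data = [16, 11, 15, 16, 11, 12, 18, 5, 3, 1]   # arbitrary illustrative dataset

T = mean(data)     # statistic computed on the original dataset
m = 5_000          # number of bootstrap subsamples

# Each subsample draws length(data) values from the dataset with replacement
Tj = [mean(rand(rng, data, length(data))) for _ in 1:m]

# Bootstrap estimate of the uncertainty on T
sigma_T = sqrt(sum((Tj .- T) .^ 2) / m)

println(T, " +/- ", sigma_T)
```

For the mean, the result can be compared with the usual error on the mean, i.e. the standard deviation of the data divided by $\sqrt{N}$.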
"""

# ╔═╡ fcfedd22-8c34-40ba-9507-ac6508b35250
begin
	# Let's define a dataset
	dataset = [16,11,15,16,11,12,18,5,3,1]
	
	# Let's compute the mean and the median
	
	μdt = mean(dataset)
	
	μ½dt = median(dataset) 
	
	#printfmtln("μ = {:.2f} and μ½ = {:.2f}", μdt, μ½dt)
	
	# Number of resampling
	n_boot = 1000
	
	bs1 = bootstrap(mean, dataset, BasicSampling(n_boot))
	bs2 = bootstrap(median, dataset, BasicSampling(n_boot))
	
	cil = 0.68;
	
	## basic CI
	bci1 = confint(bs1, NormalConfInt(cil))
	bci2 = confint(bs2, NormalConfInt(cil))
	
	#printfmtln("1σ uncertainty for the mean: {:.2f}", (bci1[1][3]-bci1[1][2])/2)
	#printfmtln("1σ uncertainty for the median: {:.2f}", (bci2[1][3]-bci2[1][2])/2)
	
end;

# ╔═╡ 57f7d2c8-b34a-4a27-b59d-147e8b73d124
Markdown.parse("""
##### μ =  $(latexify(μdt,fmt="%.2f"))

##### 1σ uncertainty for the mean = $(latexify((bci1[1][3]-bci1[1][2])/2,fmt="%.2f"))

##### 1σ uncertainty for the median = $(latexify((bci2[1][3]-bci2[1][2])/2,fmt="%.2f"))
""")

# ╔═╡ 5b2118eb-f0bc-43a5-ae7b-8a8133cf7f5d
md"""
- For normally distributed data, the uncertainty on the median is larger than the one on the mean (for large samples, $\sigma_{1/2} \approx \sqrt{\pi/2}~\sigma \approx 1.25~\sigma$).
- However, the median is much less sensitive to outliers.
"""

# ╔═╡ 330c1d7e-c254-4501-b6b9-0409f581ea1c
md"""
## (Frequentist) hypothesis testing
***

- In hypothesis testing, a set of complementary hypotheses is proposed, which consists of a null hypothesis and an alternative hypothesis. 

- When conducting a hypothesis test, we start by assuming that the null hypothesis holds true. 
    - If the observed value is likely to occur under the condition that the null hypothesis is true, then we do not reject the null hypothesis. 
    - However, if the observed value is unlikely to occur, then we reject the null hypothesis and accept the alternative hypothesis.
"""

# ╔═╡ 6a67e4bb-8fcf-44fc-94aa-d40584baca0f
# ╠═╡ show_logs = false
md"""
$(LocalResource("Pics/HT.png"))
"""

# ╔═╡ c9dd8081-ac3e-446e-9a99-5aa28d22cfc2
md"""
- In order to conduct hypothesis testing, we need to define a significance level. The significance level is the probability threshold below which we consider the observation too unlikely under the null hypothesis, i.e. the probability of a Type I error that we are willing to accept. 

    - If we set the significance level as 0.05, then as long as the p-value of the observation is higher than 5%, we do not reject the null hypothesis. 

    - However, if the p-value of the observation falls below 5%, we reject the null hypothesis and accept the alternative hypothesis. 

- There is a tradeoff between the Type I and Type II errors. Basically, a higher significance level makes it easier to reject the null hypothesis. Although a higher significance level reduces the Type II error rate, it also results in a higher Type I error rate at the same time. 

- The only way to reduce both Type I and Type II error is by increasing the sample size...
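
The trade-off can be illustrated with a small simulation. This is only a sketch under stated assumptions: a one-sided z-test with known $\sigma = 1$, a hypothetical true mean of 0.5 under the alternative, and the standard normal critical values 1.645 and 2.326 for significance levels of 0.05 and 0.01. The helper names `rejects` and `rejection_rate` are introduced here for illustration.

```julia
using Random

# One-sided z-test of H0: μ = 0 against H1: μ > 0, with known σ = 1.
# H0 is rejected when sqrt(n) * (sample mean) exceeds the critical value zcrit.
rejects(sample, zcrit) = sqrt(length(sample)) * sum(sample) / length(sample) > zcrit

# Fraction of rejections over many simulated samples of size n with true mean μ
function rejection_rate(rng, μ, n, zcrit; trials = 20_000)
    return count(rejects(μ .+ randn(rng, n), zcrit) for _ in 1:trials) / trials
end

rng = MersenneTwister(1)

# Type I error rate: H0 is true (μ = 0) but it is rejected
type1_a05 = rejection_rate(rng, 0.0, 10, 1.645)

# Type II error rate: H0 is false (μ = 0.5) but it is not rejected
type2_a05_n10 = 1 - rejection_rate(rng, 0.5, 10, 1.645)
type2_a01_n10 = 1 - rejection_rate(rng, 0.5, 10, 2.326)
type2_a05_n40 = 1 - rejection_rate(rng, 0.5, 40, 1.645)
```

Lowering the significance level from 0.05 to 0.01 at fixed sample size increases the Type II error rate, while a larger sample reduces it at fixed significance level.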




### The $\chi^2$ distribution
***

- We need to define the $\chi^2$ distribution.

    - If we carry out $N$ independent measurements $\{x_i\}$ of a normally distributed random variable $\textbf{x}$ with mean $\mu = 0$ and variance $\sigma^2 = 1$, then the sum $\sum_{i=1}^N x_i^2$ follows the $\chi^2$ distribution with $N$ degrees of freedom.
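
This definition can be checked numerically. The sketch below, with an arbitrary choice of $N$, number of trials and seed, builds many sums of squares of standard normal draws and compares their sample mean and variance with the values $N$ and $2N$ known for a $\chi^2$ distribution.

```julia
using Random, Statistics

rng = MersenneTwister(1)   # fixed seed, only for reproducibility
N = 20                     # degrees of freedom (illustrative)
trials = 50_000

# Each entry is the sum of squares of N independent standard normal draws
q = [sum(abs2, randn(rng, N)) for _ in 1:trials]

# Sample mean and variance, to be compared with N and 2N
q_mean, q_var = mean(q), var(q)
```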
    
"""

# ╔═╡ c285a4b1-9327-4a79-bbb2-42ef0e22500f
cm"""
``\chi^2`` degrees of freedom: $( @bind chi1 PlutoUI.Slider(2:40, default=20) ) 
"""

# ╔═╡ 16f4655d-2257-496b-9726-b0e12ca545b2
begin
	fg6 = Figure()
	ax1fg6 = Axis(fg6[1, 1],
	    title = L"\chi^2"
	)
	
	rngfg6 = range(start=0,stop=50,step=0.1)
	
	chi2Distribution = Chisq(5)
	pdfpl = pdf(chi2Distribution,rngfg6)
	lines!(rngfg6,pdfpl,label=L"\chi^2_5")
	#
	chi2Distribution = Chisq(chi1)
	pdfpl = pdf(chi2Distribution,rngfg6)
	lines!(rngfg6,pdfpl,label=L"\chi^2_{%$chi1}")
	
	
	axislegend()
	
	fg6
end

# ╔═╡ b384465f-b1d8-4deb-926c-76d48e4e9dda
md"""
- The mean of a $\chi^2$ distribution with $N$ degrees of freedom is $N$. Often one works with the *reduced* $\chi^2$, i.e. $\chi^2_\nu \equiv \chi^2 / N$, which has mean value $1$.
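
It is also useful to recall how much scatter to expect around these mean values. Using the known variance of the $\chi^2$ distribution with $N$ degrees of freedom:

```math
\mathrm{E}[\chi^2] = N, \qquad \mathrm{Var}[\chi^2] = 2N
\quad\Longrightarrow\quad
\mathrm{E}\left[\frac{\chi^2}{N}\right] = 1, \qquad
\sqrt{\mathrm{Var}\left[\frac{\chi^2}{N}\right]} = \sqrt{\frac{2}{N}}
```

As an example, with $N = 8$ degrees of freedom the reduced $\chi^2$ is expected to be $1$ with a typical scatter of $\sqrt{2/8} = 0.5$, so a value of $1.4$ is unremarkable while a value of $5$ is not.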


### Straight line fitting
***

- Let's study a few examples of fitting a straight line $y = ax + b$. If the model is correct and the errors are Gaussian, each normalized residual $Y_i = \frac{y_i - (ax_i +b)}{\sigma_{y_i}}$ should be normally distributed with $\mu = 0$ and $\sigma^2 = 1$.

- Hence: $\chi^2 = \sum_{i=1}^N \left(\frac{y_i - (ax_i +b)}{\sigma_{y_i}}\right)^2$ will follow a $\chi^2$ distribution with $N-2$ degrees of freedom (two free parameters, $a$ and $b$, are estimated from the $N$ data points).
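
For a straight line the minimum of the $\chi^2$ can also be found in closed form by solving the weighted normal equations. The following is a minimal sketch of this alternative, not the numerical minimization adopted in this notebook; the function name `fit_line` and the test data are hypothetical.

```julia
using LinearAlgebra

# Weighted least-squares fit of y = a*x + b with per-point errors errs.
# Returns the best-fit a, b and the chi-square at the minimum.
function fit_line(xs, ys, errs)
    A = hcat(xs, ones(length(xs)))      # design matrix: columns x and 1
    W = Diagonal(1 ./ errs .^ 2)        # weights 1/σ²
    a, b = (A' * W * A) \ (A' * W * ys) # solve the normal equations
    chi2 = sum(((ys .- (a .* xs .+ b)) ./ errs) .^ 2)
    return a, b, chi2
end

# Noise-free points lying exactly on y = 2x + 1
xs   = [1.0, 2.0, 3.0, 4.0]
ys   = 2 .* xs .+ 1
errs = fill(0.5, length(xs))

a_fit, b_fit, chi2_fit = fit_line(xs, ys, errs)
```

Since the points lie exactly on a line, the fit recovers $a = 2$ and $b = 1$ with a vanishing $\chi^2$ (up to rounding).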
"""

# ╔═╡ ef2091c7-17dd-43f9-b950-8948ab7a0d0c
begin
	# Let's define a function to fit and a chi2 to minimize
	
	f(x;a=1,b=0) = a.*x .+ b
	
	χ2(prs) = sum((f(x,a=prs[1],b=prs[2]) .- y).^2 ./ σ.^2)
	
	
	fg7 = Figure(size=(1000,400))
	
	
	# case 1
	x = [0.96,1.95,2.93,3.98,4.97,6.07,7.02,8.06,8.94,10.05]
	y = [-3.99,25.48,52.17,49.88,44.745,65.00,74.76,69.23,103.26,100.37]
	σ = [9.53,10.12,9.46,8.86,9.80,9.56,9.69,9.35,9.97,9.48]
	
	x0 = [10.0, 5.0]
	res = optimize(χ2, x0)
	
	prs1 = Optim.minimizer(res)
	χ21 = χ2(prs1)
	c21 = format(χ2(prs1),precision=2)
	c2r1 = format(χ2(prs1)/(length(x)-2),precision=2)
	title = latexstring("\\chi^2="*c21*",\\ \\chi^2_\\nu="*c2r1)
	
	ax1fg7 = Axis(fg7[1, 1],
	    title = title
	)
	
	scatter!(x,y,color=:blue)
	errorbars!(x,y,σ,color=:blue)
	lines!(0:12,f(0:12,a=prs1[1],b=prs1[2]),color=:blue)
	
	xlims!(0,12)
	ylims!(-20,120)
	
	
	# case 2
	x = [0.94,1.89,2.94,3.99,4.96,5.98,7.01,8.09,9.0,10.0]
	y = [10.08,21.47,43.07,90.58,101.73,137.73,180.6,242.09,289.55,354.8]
	σ = [8.57,8.9,9.75,11.88,11.19,11.05,10.83,11.09,11.66,10.92]
	
	x0 = [10.0, 5.0]
	res = optimize(χ2, x0)
	
	prs2 = Optim.minimizer(res)
	χ22 = χ2(prs2)
	c22 = format(χ2(prs2),precision=2)
	c2r2 = format(χ2(prs2)/(length(x)-2),precision=2)
	title = latexstring("\\chi^2="*c22*",\\ \\chi^2_\\nu="*c2r2)
	
	ax1fg7 = Axis(fg7[1, 2],
	    title = title
	)
	
	scatter!(x,y,color=:blue)
	errorbars!(x,y,σ,color=:blue)
	lines!(0:12,f(0:12,a=prs2[1],b=prs2[2]),color=:blue)
	
	xlims!(0,12)
	ylims!(0,400)
	
	
	# case 3
	x = [1.02,2.0,3.0,3.92,5.02,5.97,7.05,8.03,9.03,9.97]
	y = [5.34,24.13,32.08,45.38,56.59,67.88,72.01,85.41,93.09,102.65]
	σ = [8.93,9.87,10.14,9.65,9.7,10.13,9.59,9.7,9.39,9.63]
	
	x0 = [10.0, 5.0]
	res = optimize(χ2, x0)
	
	prs3 = Optim.minimizer(res)
	χ23 = χ2(prs3)
	c23 = format(χ2(prs3),precision=2)
	c2r3 = format(χ2(prs3)/(length(x)-2),precision=2)
	title = latexstring("\\chi^2="*c23*",\\ \\chi^2_\\nu="*c2r3)
	
	ax1fg7 = Axis(fg7[1, 3],
	    title = title
	)
	
	scatter!(x,y,color=:blue)
	errorbars!(x,y,σ,color=:blue)
	lines!(0:12,f(0:12,a=prs3[1],b=prs3[2]),color=:blue)
	
	xlims!(0,12)
	ylims!(0,120)
	
	
	fg7
end

# ╔═╡ 8c0d3dec-2385-426c-9304-36299356f10b
md"""
- Inspecting the results visually, we might state that:
    - **case 1** is an acceptable fit: residuals look randomly distributed and of the "right size" (i.e. if errors are at $1\sigma$, roughly 30% of the points should be farther than $1\sigma$ from the fit).
    - **case 2** is clearly unacceptable. Residuals look "big" and definitely not randomly distributed.
    - **case 3** shows very small residuals, but this is not a signature of a good fit. Typically, this means that errors are overestimated (or that points are not statistically independent).
    
- We can formalize part of this visual impression by computing the probability, based on the $\chi^2$ distribution, of having residuals at least as large as those measured. 
    - This is just the integral of the $\chi^2$ distribution with the given degrees of freedom from the obtained $\chi^2$ value to infinity. 
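
This tail integral is the complementary cumulative distribution function of the $\chi^2$ distribution. A minimal sketch with Distributions.jl is shown below; the helper name `chi2_pvalue` is hypothetical, and it is only assumed here that it matches what the p-value function used in the next cell computes.

```julia
using Distributions

# Upper-tail probability of a chi-square variable with `dof` degrees of freedom
chi2_pvalue(chi2, dof) = ccdf(Chisq(dof), chi2)

# Example: a fit whose chi-square equals its expected value (8 for 8 dof)
p_example = chi2_pvalue(8.0, 8)
```

In this example the probability is about 0.43: a $\chi^2$ equal to the number of degrees of freedom is entirely unremarkable.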
"""

# ╔═╡ bd6c2b2f-8e1c-46fc-a6ff-cd3381dff002
begin
	# The same in all cases
	dof = length(x)-2
	
	# case 1
	p121 = Frequentist_p_value(χ21,dof)
	    
	# case 2
	p222 = Frequentist_p_value(χ22,dof)
	
	# case 3
	p323 = Frequentist_p_value(χ23,dof)
	
	#println("Probability:")
	#printfmtln("\tCase 1: {:.3g}", p121)
	#printfmtln("\tCase 2: {:.3g}", p222)
	#printfmtln("\tCase 3: {:.3g}", p323)
	
end;

# ╔═╡ a23d9857-5b1a-44aa-9846-2f9ce5b8b8ca
Markdown.parse("""

#### Probability

##### - Case 1:  $(latexify(p121,fmt="%.3g"))

##### - Case 2:  $(latexify(p222,fmt="%.3g"))

##### - Case 3:  $(latexify(p323,fmt="%.3g"))
""")

# ╔═╡ 1ba7dcb5-8fc3-4e49-b790-667542020d5b
md"""
- These results can be read as follows: in **case 1** there is about a 13% probability of getting residuals as large as or larger than those measured. In **case 2** this probability is negligible, and in **case 3** it is very high. 

- **Case 1**, depending on the threshold one may define, is likely an acceptable fit. **Case 2** means that the model is highly inadequate, and **case 3** means that the residuals are so small that the input data (typically the quoted errors) look very suspicious.
"""

# ╔═╡ fb2fa3d1-e471-4f61-849a-24cf280b7085
md"""
## The Kolmogorov-Smirnov (K-S) test
***

- It is a general test to determine whether two datasets "differ" in a statistical sense, i.e. whether they are compatible with being drawn from the same parent distribution.

- It can also be used to check the consistency of a dataset with a given parent distribution.

- It is very general, since it makes no assumption about the shape of the parent distribution.
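
The statistic used by the two-sample test is the maximum distance between the two empirical cumulative distributions. A minimal sketch of how it can be computed by hand is given below; the helper names `ecdf_at` and `ks_statistic` and the short input arrays are hypothetical.

```julia
# Empirical cumulative distribution of `sample` evaluated at the point t
ecdf_at(sample, t) = count(<=(t), sample) / length(sample)

# Maximum absolute difference between the two empirical CDFs.
# Both are step functions, so it is enough to check the data points.
function ks_statistic(a, b)
    pts = vcat(a, b)
    return maximum(abs(ecdf_at(a, t) - ecdf_at(b, t)) for t in pts)
end

D_example = ks_statistic([1.0, 2.0, 3.0], [2.5, 3.5, 4.5])
```

For these two short arrays the maximum distance is 2/3, reached for instance at $t = 2$, where the first empirical distribution is already at 2/3 and the second is still at 0.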
"""

# ╔═╡ b1d712fa-2e5a-4995-9251-a2a51dac639a
begin
	# Let's define two datasets
	
	A = [9.5,11.4,16.1,8.4,9.4,7.7,10.6,5.7,7.7,10.3,12.0,5.2,8.4,11.9,7.3,12.2,8.6,10.4,10.6,8.5]
	
	B = [10.2,9.5,10.2,8.6,8.2,10.3,13.8,12.8,11.2,10.0,6.9,10.2,9.0,9.5,10.9,9.6,9.5,11.1,14.8,9.6]
	
	
	fg8 = Figure()
	ax1fg8 = Axis(fg8[1, 1],
	    ylabel = "N",
	    title = "A vs B histogram"
	)
	
	hist!(A,color=(:blue,0.5),label="A",bins=10)
	hist!(B,color=(:orange,0.5),label="B",bins=10)
	
	xlims!(0,20)
	
	axislegend()
	
	fg8
end

# ╔═╡ 4a9ec547-048a-434b-a369-598469fa554b
# ╠═╡ show_logs = false
md"""
- So, are the two distributions consistent?

- The K-S test compares the two empirical cumulative distributions and measures their maximum "vertical" distance:

$(LocalResource("Pics/KS.png"))

- The K-S test p-value measures the probability that a distance at least as large as the observed one arises by chance, if the two datasets are drawn from the same distribution.
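
For large samples this probability can be approximated with the asymptotic Kolmogorov series. The sketch below is an illustration under that assumption (the function name `ks_pvalue` and the truncation of the series are choices made here); library implementations may differ in the details.

```julia
# Asymptotic two-sided K-S p-value for a maximum distance D between two
# samples of sizes n and m (Kolmogorov series truncated at kmax terms)
function ks_pvalue(D, n, m; kmax = 100)
    λ = D * sqrt(n * m / (n + m))
    # For very small λ the series converges poorly and the p-value is ≈ 1
    λ < 0.2 && return 1.0
    s = 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * λ^2) for k in 1:kmax)
    return clamp(s, 0.0, 1.0)
end

p_small_distance = ks_pvalue(0.1, 20, 20)   # nearly identical distributions
p_large_distance = ks_pvalue(1.0, 20, 20)   # completely separated samples
```

A small distance gives a p-value close to 1, while two completely separated samples of 20 points each give a negligible p-value.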
"""

# ╔═╡ dd3b2aec-9207-4a0d-a103-6284a5c8439f
ApproximateTwoSampleKSTest(A,B)

# ╔═╡ 21bcc13f-08e8-425b-b9bc-6c1b2b358f56
md"""
- Hence, we cannot reject the hypothesis that the two datasets are drawn from the same distribution.
"""

# ╔═╡ d7685e51-6e8d-4b44-a7ab-de107de115d3
md"""
## Reference & Material
***

Material and papers related to the topics discussed in this lecture.

- [E. Feigelson & G.J. Babu - "Modern Statistical Methods for Astronomy"](https://www.cambridge.org/core/books/modern-statistical-methods-for-astronomy/941AE392A553D68DD7B02491BB66DDEC)
- [S.H. Chan - "Introduction to Probability for Data Science"](https://probability4datascience.com/index.html)
"""

# ╔═╡ 4d5143ca-835d-4cf6-80a6-2d242812ef9d
md"""
### Credits
***

This notebook contains material obtained from [https://towardsdatascience.com/fundamental-statistics-7770376593b](https://towardsdatascience.com/fundamental-statistics-7770376593b) and [https://dk81.github.io/dkmathstats_site/prob-inverse-cdf.html](https://dk81.github.io/dkmathstats_site/prob-inverse-cdf.html).
"""

# ╔═╡ 7d3b307c-9404-4f8f-9f2d-3fe2199bdad3
cm"""
## Course Flow
***

<table>
  <tr>
    <td>Previous lecture</td>
    <td>Next lecture</td>
  </tr>
  <tr>
      <td><a href="./open?path=Lectures/Lecture - Introduction/Lecture-Introduction.jl">Introduction</a></td>
    <td><a href="./open?path=Lectures/Lecture - Statistics Reminder/Lecture-BayesianReminder.jl">Reminder of Bayesian statistics</a></td>
  </tr>
 </table>


"""

# ╔═╡ f2380288-5372-4199-93f5-5582878681cc
md"""
**Copyright**

This notebook is provided as [Open Educational Resource](https://en.wikipedia.org/wiki/Open_educational_resources). Feel free to use the notebook for your own purposes. The text is licensed under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/), the code of the examples, unless obtained from other properly quoted sources, under the [MIT license](https://opensource.org/licenses/MIT). Please attribute the work as follows: *Stefano Covino, Time Domain Astrophysics - Lecture notes featuring computational examples, 2025*.
"""

# ╔═╡ Cell order:
# ╟─b95f212e-a904-425b-ab72-3a2a62a36051
# ╟─cb5178e4-ca00-4d0a-8ae2-abc8aba79edc
# ╠═d1d6c1fa-d6f2-4ab5-9387-57d141b81fe3
# ╠═90fa08da-6fb6-4a7a-ab84-82db43589cf6
# ╠═c97d6e21-0c4c-4d79-8193-abc8e3001d0c
# ╟─782247c0-5683-45be-83dc-c640fc5af519
# ╟─7609a27d-dcaf-400a-b07a-b2ce79d46546
# ╟─2bcfe923-1fa1-4faa-96a4-4493ab10f607
# ╟─c3183c6f-74bb-4ad6-9e9c-3c4f3505a472
# ╟─aec2e7b5-d07b-4246-82bb-f1b6867261fe
# ╟─ee09bcaa-e455-44b6-9d77-26e73e3942e3
# ╟─8d7236e1-452c-4623-a6dd-756c5eb53f5d
# ╟─b9462e27-1bc9-4d92-ac01-efb604fe8eff
# ╟─5f6217f3-7c8b-4feb-8150-af67660fe6a4
# ╟─85b2cb68-9d2b-484a-b979-0b81de743178
# ╟─f3af6e72-7146-4eea-951f-37ee2f08cbff
# ╟─fbdfc6c9-dee0-43f2-8c50-a11ff74eafe6
# ╟─6a4cba4f-bf8a-497e-98ae-8b32503dc983
# ╟─3b6a9f06-7bca-4745-8cce-c878462fd943
# ╟─3879ebfd-2c2c-4339-bca4-4a8f09a7d10d
# ╟─5ada7105-3a68-48a6-a559-a136e76584a8
# ╟─273d70f2-6db2-42a9-bbc6-76c94d2d7f11
# ╟─b18909e0-205e-40ad-ab87-7985d986cb9d
# ╟─88012fd0-476a-46eb-9375-202ceccfc953
# ╠═abac4c88-2feb-463d-9bf3-48c56029548c
# ╟─6f8f4451-e403-4f22-a10b-64684a790fac
# ╟─4ca1d27b-6d94-4786-a429-40008570f48a
# ╟─6870814c-7eda-419a-86bc-bc8b4335b683
# ╟─1e8ca5b8-a7d3-40f4-8683-d7a286f03ce5
# ╟─c33a250d-c60a-4608-b74b-816a9770964a
# ╟─6d7daebf-9ec3-462e-bafc-17e3d36201ec
# ╠═8359d0d5-28cd-4afc-84e9-711b1da84d1f
# ╠═4a59ebc1-2cdb-44a3-9088-f6bc96e733aa
# ╠═3e195d76-4ce2-4cc7-9106-fe0e7f728ed8
# ╟─878d2155-a4fe-4b51-9417-879aa259b52f
# ╠═468e8a03-a11d-48ee-9699-1d10004363d2
# ╟─da6b95d4-a640-483e-936a-45a8aee4f4cb
# ╠═f68bc201-d4d6-45a3-8ead-bf767276f657
# ╠═b0ecbb26-0822-4c56-8eaa-605f8f6cd59e
# ╟─1a80b7ea-4515-44c4-9476-ba9b6eac30c5
# ╟─b3163979-4422-4c47-b0f5-049c091ce2da
# ╟─ea34b5a9-0937-4cbc-9afd-79055556e640
# ╟─2c20004a-4929-401e-b653-23b471d9a487
# ╟─657c620c-b79f-4fb2-b78e-84c328d2e8ed
# ╟─dd1f7837-f591-4b45-951e-8d1eb2664c8c
# ╟─47347aea-bcd9-4329-89e0-0c2020df16fc
# ╟─ed6701f4-9656-4f07-84c6-8e634cecea98
# ╟─03744ccd-7a2b-41e7-a10f-ee1b9c0013c9
# ╟─8dc68c2e-d1f0-475a-9c7f-d06210d09943
# ╟─823f9f4c-0c19-40b6-bab0-4b5832d15a6d
# ╠═ea9d2ebc-67b3-4945-b8ba-7598e5073b17
# ╟─4bdf78e5-9c93-4727-b662-483ca9d2012a
# ╟─1ac335ee-257a-40b3-8357-33466b1b1cbc
# ╟─73b91ae5-e2ec-45c8-be23-7aed9a9e8b15
# ╟─5c946d0d-81f8-43f3-a0ab-196ea9423021
# ╠═813ab26f-b80e-4fc2-8d95-357ccac051ed
# ╠═6cdb0484-7ffc-41e3-8332-b5486ed5bd18
# ╟─f8c8b403-0684-4263-8aba-3e2f9c99f37a
# ╟─adec1076-9584-4620-b76b-95cc61d05ed2
# ╟─680e6fe2-6bb6-4cae-8289-f634b9a8618e
# ╠═fcfedd22-8c34-40ba-9507-ac6508b35250
# ╟─57f7d2c8-b34a-4a27-b59d-147e8b73d124
# ╟─5b2118eb-f0bc-43a5-ae7b-8a8133cf7f5d
# ╟─330c1d7e-c254-4501-b6b9-0409f581ea1c
# ╟─6a67e4bb-8fcf-44fc-94aa-d40584baca0f
# ╟─c9dd8081-ac3e-446e-9a99-5aa28d22cfc2
# ╟─c285a4b1-9327-4a79-bbb2-42ef0e22500f
# ╠═16f4655d-2257-496b-9726-b0e12ca545b2
# ╟─b384465f-b1d8-4deb-926c-76d48e4e9dda
# ╠═ef2091c7-17dd-43f9-b950-8948ab7a0d0c
# ╟─8c0d3dec-2385-426c-9304-36299356f10b
# ╠═bd6c2b2f-8e1c-46fc-a6ff-cd3381dff002
# ╟─a23d9857-5b1a-44aa-9846-2f9ce5b8b8ca
# ╟─1ba7dcb5-8fc3-4e49-b790-667542020d5b
# ╟─fb2fa3d1-e471-4f61-849a-24cf280b7085
# ╠═b1d712fa-2e5a-4995-9251-a2a51dac639a
# ╟─4a9ec547-048a-434b-a369-598469fa554b
# ╠═dd3b2aec-9207-4a0d-a103-6284a5c8439f
# ╟─21bcc13f-08e8-425b-b9bc-6c1b2b358f56
# ╟─d7685e51-6e8d-4b44-a7ab-de107de115d3
# ╟─4d5143ca-835d-4cf6-80a6-2d242812ef9d
# ╟─7d3b307c-9404-4f8f-9f2d-3fe2199bdad3
# ╟─f2380288-5372-4199-93f5-5582878681cc
